TODO: re-include the Vahey et al. self-esteem study? Check the clinical exclusions.
# dependencies
library(tidyverse)
library(pwr)
library(irr)
library(metafor)
library(forestploter)
library(janitor)
library(greekLetters)
library(knitr)
library(kableExtra)
dir.create("models", showWarnings = FALSE)
dir.create("plots", showWarnings = FALSE)
# function to round all numeric vars in a data frame
round_df <- function(df, n_digits = 3) {
df %>% mutate_if(is.numeric, janitor::round_half_up, digits = n_digits)
}
# apa format p value -----
apa_p_value <- function(p){
p_formatted <- ifelse(p >= 0.001, paste("=", round_half_up(p, 3)),
ifelse(p < 0.001, "< .001", NA))
p_formatted <- gsub(pattern = "0.", replacement = ".", x = p_formatted, fixed = TRUE)
p_formatted
}
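As a quick sanity check of the formatter, usage with illustrative values of my choosing (not values from the analyses below):

```r
# illustrative usage of apa_p_value(); input values are arbitrary examples
apa_p_value(0.04231)  # "= .042"
apa_p_value(0.0004)   # "< .001"
```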
# meta results string
meta_effect_string <- function(fit, predictions){
paste0("k = ", fit$k, ", r = ", predictions$pred,
", 95% CI [", predictions$ci.lb, ", ", predictions$ci.ub, "]",
", 95% CR [", predictions$pi.lb, ", ", predictions$pi.ub, "]",
", p = ", signif(2*pnorm(-abs(fit$zval)), digits = 1)) # exact p value from z score
}
# heterogeneity strings
# heterogeneity_string <- function(fit){
# #"Q(df=", fit$k - 1, ") = ", janitor::round_half_up(fit$QE, 2),
# # ", p ", ifelse(fit$QEp < 0.001, "< .001", paste0("= ", as.character(janitor::round_half_up(fit$QEp, 3)))),
# # ", ",
# if(!is.null(fit$I2)){
# heterogeneity_string <-
# paste0("Heterogeneity: ", greekLetters::greeks("tau^2"), " = ", janitor::round_half_up(fit$tau2, 2),
# ", ", greekLetters::greeks("I^2"), " = ", janitor::round_half_up(fit$I2, 2),
# "%, ", greekLetters::greeks("H^2"), " = ",janitor::round_half_up(fit$H2, 2))
# } else {
# heterogeneity_string <-
# paste0("Heterogeneity: ", greekLetters::greeks("tau^2"), " = ", janitor::round_half_up(fit$tau2, 2))
# }
# return(heterogeneity_string)
# }
#
# footnote = heterogeneity_string(fit_new)
# disable scientific notation
options(scipen = 999)
# plot theme
custom_theme <-
forest_theme(base_size = 10,
ci_lty = 1,
ci_lwd = 1.5,
ci_Theight = 0.2,
summary_fill = "black",
summary_col = "black",
footnote_cex = 0.8)
# get data
data_vahey <- read.csv("../data/data_vahey_et_al_2015.csv")
Calculated from the meta-analytic effect sizes reported in the forest plot and text.
Vahey et al.’s reported meta effect size estimate was r = .45, 95% CI [.40, .54], 95% CR [.23, .67]. Using this effect size, they conducted a power analysis.
I used the R package pwr to attempt to reproduce these values.
However, Vahey et al.’s analytic choices here could be questioned: one-tailed tests with alpha = .05 are very uncommon when reporting the significance of correlations, and regression analyses require two-sided testing. A two-tailed test with alpha = .05 would correspond far more closely to modal research practices. I therefore recomputed the sample size estimates using these parameters:
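A minimal sketch of such a recomputation with pwr. The 80% power level is my assumption for illustration; Vahey et al.’s exact inputs are reconstructed here, not quoted from their code.

```r
library(pwr)

# required N under Vahey et al.'s apparent choices:
# one-tailed test of r = .45 at alpha = .05 (power level assumed to be .80 here)
pwr.r.test(r = 0.45, sig.level = 0.05, power = 0.80, alternative = "greater")

# required N under modal research practice: two-tailed test, same alpha and power
pwr.r.test(r = 0.45, sig.level = 0.05, power = 0.80, alternative = "two.sided")
```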
Calculated from weighted average effect sizes reported in forest plot.
The power analyses relied on the accuracy of the meta-analytic effect size. So, I then attempted to computationally reproduce the meta-analytic effect size from the effect sizes, confidence intervals, credibility intervals, and sample sizes reported in Vahey et al.’s forest plot and their description of their meta-analytic method in their manuscript.
In an email, Vahey stated that they performed a Hunter & Schmidt meta-analysis, following the guide of Field & Gillett (2010) and using their code.
On inspection, Field & Gillett (2010) make a distinction between the Hunter & Schmidt method, which is distinctive for reporting credibility intervals rather than confidence intervals, and the Hedges and colleagues’ method, which is distinctive for using Fisher’s r-to-z transformations.
Vahey et al. (2015) do not report applying any transformations to their data. However, the individual effect sizes in Vahey et al.’s (2015) forest plot have asymmetric confidence intervals, which implies that a transformation was performed. It is therefore possible that Vahey et al. combined the two methods (and code) provided by Field & Gillett (2010). I first fit a model using the Hunter & Schmidt method, then one combining the Hunter & Schmidt and Hedges-and-colleagues methods.
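The relevance of the transformation can be seen directly: a symmetric confidence interval on Fisher’s z becomes asymmetric once back-transformed to r. The values below are illustrative, not taken from Vahey et al.:

```r
r  <- 0.45
n  <- 30
z  <- atanh(r)           # Fisher's r-to-z transformation
se <- 1 / sqrt(n - 3)    # standard error of z
ci_z <- z + c(-1.96, 1.96) * se
tanh(ci_z)               # back-transformed CI on r is asymmetric around .45
```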
See “computational replication of meta-analysis via SPSS script” folder.
Summary of results:
results_spss <- tibble(
k_studies = c(15, 15),
total_n = c(494, 494),
pred = c(.45, .4670),
ci.lb = c(.40, .1955),
ci.ub = c(.54, .7392),
pi.lb = c(.23, .4674),
pi.ub = c(.67, .4674)
) %>%
round_df(2)
results_spss %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| k_studies | total_n | pred | ci.lb | ci.ub | pi.lb | pi.ub |
|---|---|---|---|---|---|---|
| 15 | 494 | 0.45 | 0.4 | 0.54 | 0.23 | 0.67 |
| 15 | 494 | 0.47 | 0.2 | 0.74 | 0.47 | 0.47 |
Information about the implementation of a Hunter & Schmidt-style meta-analysis in metafor was taken from here.
total_n <- data_vahey %>%
distinct(article, n_forest_plot) %>%
summarize(total_n = sum(n_forest_plot, na.rm = TRUE)) %>%
pull(total_n)
# calculate variances
data_recalculated_variance <-
escalc(measure = "COR",
ri = mean_r_forest_plot_numeric,
ni = n_forest_plot,
data = data_vahey,
vtype = "AV") %>%
drop_na() %>%
select(article, article_abbreviated,
yi, vi, ni = n_forest_plot) %>%
mutate(lower = yi - sqrt(vi)*1.96,
upper = yi + sqrt(vi)*1.96)
# fit model
fit_reproduced <-
rma(yi = yi,
vi = vi,
weights = ni, # Hunter Schmidt method requires weighting by N
method = "HS", # Hunter Schmidt method
data = data_recalculated_variance,
slab = article_abbreviated)
predictions_reproduced <-
predict(fit_reproduced) %>%
as.data.frame() %>%
select(-se)
predictions_reproduced_printing <- predictions_reproduced %>%
round_df(2)
# predictions_reproduced %>%
# round_df(2) %>%
# kable() %>%
# kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
# plot
data_forest <- data_recalculated_variance %>%
drop_na() %>%
select(article_abbreviated,
n = ni,
r = yi,
lower,
upper) %>%
# bind_rows(tibble(article_abbreviated = "Meta",
# n = 35, # arbitrary number to make the diamond an appropriate size
# r = predictions_reproduced$pred,
# lower = predictions_reproduced$pi.lb,
# upper = predictions_reproduced$pi.ub)) %>%
bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
n = 35, # arbitrary number to make the diamond an appropriate size
r = c(predictions_reproduced$pred, predictions_reproduced$pred),
lower = c(predictions_reproduced$ci.lb, predictions_reproduced$pi.lb),
upper = c(predictions_reproduced$ci.ub, predictions_reproduced$pi.ub))) %>%
mutate(size = n/sum(n),
n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
article_abbreviated = fct_relevel(article_abbreviated,
"Carpenter et al. (2012)",
"Dawson et al. (2009)",
"Hooper et al, (2010)",
"Hussey & Barnes-Holmes (2012)",
"Kishita et al. (2014)",
"Kosnes et al. (2013)",
"Nicholson & Barnes-Holmes (2012a)",
"Nicholson & Barnes-Holmes (2012b)",
"Nicholson, Dempsey et al. (2013)",
"Nicholson, McCourt et al. (2013)",
"Parling et al. (2011)",
"Remue et al. (2013)",
"Timko et al. (2010, Study 1)",
"Vahey et al. (2009)",
"Vahey et al. (2010)",
"Meta")) %>%
mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
round_df(2)
p_reproduced <-
forestploter::forest(data = select(data_forest, -size),
est = data_forest$r,
lower = data_forest$lower,
upper = data_forest$upper,
sizes = 1/data_forest$size,
#is_summary = c(rep(FALSE, nrow(data_forest)-1), TRUE),
is_summary = c(rep(FALSE, nrow(data_forest)-2), TRUE, TRUE),
ci_column = 2,
ref_line = 0,
theme = custom_theme,
xlim = c(-0.3, 1.1),
ticks_at = c(-.2, 0, .2, .4, .6, .8, 1.0))
p_reproduced

ggplot2::ggsave(filename = "plots/forest plot Hunter & Schmidt method.pdf",
plot = p_reproduced,
device = "pdf",
width = 7.5,
height = 4.5,
units = "in")

Meta effect: k = 15, r = 0.47, 95% CI [0.4, 0.54], 95% CR [0.4, 0.54], p = 0.000000000000000000000000000000000000004.
Fisher’s r-to-z transformation and back transformation, with HS estimator. No Overton (1998) transformation.
# calculate variances
data_recalculated_variance_transformed <-
escalc(measure = "ZCOR",
ri = mean_r_forest_plot_numeric,
ni = n_forest_plot,
data = data_vahey,
vtype = "AV") %>%
drop_na() %>%
select(article, article_abbreviated,
yi, vi, ni = n_forest_plot) %>%
mutate(lower = yi - sqrt(vi)*1.96,
upper = yi + sqrt(vi)*1.96)
# fit model
fit_reproduced_transformed <-
rma(yi = yi,
vi = vi,
weights = ni, # Hunter Schmidt method requires weighting by N
method = "HS", # Hunter Schmidt method
data = data_recalculated_variance_transformed,
slab = article_abbreviated)
predictions_reproduced_transformed <-
predict(fit_reproduced_transformed) %>%
as.data.frame() %>%
select(-se)
predictions_reproduced_transformed_backtransformed <- predictions_reproduced_transformed %>%
mutate_all(transf.ztor) %>%
round_df(2)
# plot
data_forest_transformed <- data_recalculated_variance_transformed %>%
select(article_abbreviated,
n = ni,
r = yi,
lower,
upper) %>%
# bind_rows(tibble(article_abbreviated = "Meta",
# n = 35, # arbitrary number to make the diamond an appropriate size
# r = predictions_reproduced_transformed$pred,
# lower = predictions_reproduced_transformed$pi.lb,
# upper = predictions_reproduced_transformed$pi.ub)) %>%
bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
n = 35, # arbitrary number to make the diamond an appropriate size
r = c(predictions_reproduced_transformed$pred, predictions_reproduced_transformed$pred),
lower = c(predictions_reproduced_transformed$ci.lb, predictions_reproduced_transformed$pi.lb),
upper = c(predictions_reproduced_transformed$ci.ub, predictions_reproduced_transformed$pi.ub))) %>%
mutate(r = transf.ztor(r),
lower = transf.ztor(lower),
upper = transf.ztor(upper)) %>%
mutate(size = n/sum(n),
n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
article_abbreviated = fct_relevel(article_abbreviated,
"Carpenter et al. (2012)",
"Dawson et al. (2009)",
"Hooper et al, (2010)",
"Hussey & Barnes-Holmes (2012)",
"Kishita et al. (2014)",
"Kosnes et al. (2013)",
"Nicholson & Barnes-Holmes (2012a)",
"Nicholson & Barnes-Holmes (2012b)",
"Nicholson, Dempsey et al. (2013)",
"Nicholson, McCourt et al. (2013)",
"Parling et al. (2011)",
"Remue et al. (2013)",
"Timko et al. (2010, Study 1)",
"Vahey et al. (2009)",
"Vahey et al. (2010)",
"Meta")) %>%
mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
round_df(2)
p_reproduced_transformed <-
forestploter::forest(data = select(data_forest_transformed, -size),
est = data_forest_transformed$r,
lower = data_forest_transformed$lower,
upper = data_forest_transformed$upper,
sizes = 1/data_forest_transformed$size,
#is_summary = c(rep(FALSE, nrow(data_forest)-1), TRUE),
is_summary = c(rep(FALSE, nrow(data_forest)-2), TRUE, TRUE),
ci_column = 2,
ref_line = 0,
theme = custom_theme,
xlim = c(-0.3, 1.1),
ticks_at = c(-.2, 0, .2, .4, .6, .8, 1.0))
p_reproduced_transformed

ggplot2::ggsave(filename = "plots/forest plot Hunter & Schmidt method with Fisher's r-to-z transformations.pdf",
plot = p_reproduced_transformed,
device = "pdf",
width = 7.5,
height = 4.5,
units = "in")

Meta effect: k = 15, r = 0.47, 95% CI [0.4, 0.54], 95% CR [0.4, 0.54], p = 0.000000000000000000000000002.
bind_rows(
results_spss %>% select(-k_studies, -total_n),
predictions_reproduced_printing,
predictions_reproduced_transformed_backtransformed
) %>%
rownames_to_column(var = "analysis") %>%
mutate(analysis = case_when(analysis == "...1" ~ "Vahey et al.",
analysis == "...2" ~ "Attempted computational reproduction: Field & Gillett's implementation of Hunter & Schmidt",
analysis == "...3" ~ "Attempted computational reproduction: Viechtbauer's implementation of Hunter & Schmidt",
analysis == "...4" ~ "Attempted computational reproduction: Viechtbauer's implementation of Hunter & Schmidt + Fisher's r-to-z transformations")) %>%
rename(estimate = pred, ci_lower = ci.lb, ci_upper = ci.ub, cr_lower = pi.lb, cr_upper = pi.ub) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| analysis | estimate | ci_lower | ci_upper | cr_lower | cr_upper |
|---|---|---|---|---|---|
| Vahey et al. | 0.45 | 0.4 | 0.54 | 0.23 | 0.67 |
| Attempted computational reproduction: Field & Gillett’s implementation of Hunter & Schmidt | 0.47 | 0.2 | 0.74 | 0.47 | 0.47 |
| Attempted computational reproduction: Viechtbauer’s implementation of Hunter & Schmidt | 0.47 | 0.4 | 0.54 | 0.40 | 0.54 |
| Attempted computational reproduction: Viechtbauer’s implementation of Hunter & Schmidt + Fisher’s r-to-z transformations | 0.47 | 0.4 | 0.54 | 0.40 | 0.54 |
I report three attempts to reproduce Vahey et al.’s results using (a) Field & Gillett’s (2010) SPSS script for a Hunter & Schmidt-style meta-analysis, which Vahey referred me to for reproducing their results; (b) a close implementation of the same Hunter & Schmidt-style meta-analysis in R using the well-established metafor package; and (c) the same Hunter & Schmidt meta-analysis in R/metafor, adding the transformations Field & Gillett recommended for a different analytic strategy (i.e., Hedges and colleagues’ use of Fisher’s r-to-z transformations).
None of the three analytic strategies/implementations fully reproduced Vahey et al.’s results.
All three return the same meta-analytic effect size estimate of r = .47, where Vahey et al. reported r = .45.
Both metafor implementations produce the same confidence intervals as those reported in Vahey et al., but Field & Gillett’s SPSS script produces very different ones.
Only the analysis that included Fisher’s r-to-z transformations can account for and replicate the asymmetric confidence intervals on individual effect sizes that Vahey et al. reported in their forest plot. At minimum, it appears that Vahey et al. employed (but did not report using) a mixture of the Hunter & Schmidt method and the transformation from Hedges and colleagues’ method.
No implementation produced similar credibility intervals to Vahey et al.
Reported in forest plot, calculated from individual effect sizes in supplementary materials.
Next, I attempted to reproduce the weighted-mean effect sizes reported in Vahey et al.’s forest plot from the individual effect sizes they reported in their supplementary materials.
Vahey et al. reported that they weighted by degrees of freedom, and I do the same.
I noted that some of the confidence intervals in Vahey et al.’s forest plot are asymmetrical around the point estimate. This is uncommon and is not accounted for by Vahey et al.’s description of how they calculated the effect sizes and their confidence intervals. However, I take them at face value, as they are the most detailed data available to work from.
# recalculate and compare
data_weighted_effect_sizes <- data_vahey %>%
select(article = article_abbreviated,
r_supplementary,
df_supplementary) %>%
group_by(article) %>%
mutate(mean_df = mean(df_supplementary)) %>%
ungroup() %>%
mutate(r_weighted_by_df = r_supplementary*(df_supplementary/mean_df)) %>%
group_by(article) %>%
summarize(r_recalculated_weighted_mean = round_half_up(mean(r_weighted_by_df), 2)) %>%
left_join(data_vahey %>%
select(article = article_abbreviated,
mean_r_forest_plot_numeric) %>%
drop_na(),
by = "article") %>%
mutate(congruence = ifelse(janitor::round_half_up(r_recalculated_weighted_mean, 2) ==
janitor::round_half_up(mean_r_forest_plot_numeric, 2), "Congruent", "Incongruent"),
congruence = fct_relevel(congruence, "Congruent", "Incongruent"),
difference = r_recalculated_weighted_mean - mean_r_forest_plot_numeric)
# plot
p_comparison_reaveraged <-
ggplot(data_weighted_effect_sizes, aes(mean_r_forest_plot_numeric, r_recalculated_weighted_mean)) +
geom_abline(slope = 1, linetype = "dotted") +
geom_point(aes(color = congruence, shape = congruence), size = 2.25) +
theme_classic() +
scale_color_viridis_d(begin = 0.25, end = 0.75) +
theme(legend.position = "top", # c(0.25, 0.85),
legend.title = element_blank()) +
#legend.box.background = element_rect(colour = "black")) +
xlim(0.35, 0.75) +
ylim(0.35, 0.75) +
xlab("Reported in Vahey et al.'s (2015)\nforest plot") +
ylab("Recalculated from Vahey et al.'s (2015)\nsupplementary materials") +
annotate("text", x = .66, y = .45, size = 3, label = "Overestimates validity") +
annotate("text", x = .45, y = .66, size = 3, label = "Underestimates validity")
p_comparison_reaveraged

ggplot2::ggsave(filename = "plots/scatter plot comparing effect sizes in forest plot to reaveraged from supplementary materials.pdf",
plot = p_comparison_reaveraged,
device = "pdf",
width = 4.5,
height = 4.5,
units = "in")
# table
data_weighted_effect_sizes %>%
summarize(n_incongruent = sum(ifelse(congruence == "Incongruent", TRUE, FALSE)),
percent_incongruent = round_half_up(mean(ifelse(congruence == "Incongruent", TRUE, FALSE)), 2)*100) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| n_incongruent | percent_incongruent |
|---|---|
| 2 | 13 |
data_weighted_effect_sizes %>%
filter(difference != 0) %>%
select(article, r_recalculated_weighted_mean, mean_r_forest_plot_numeric, difference, congruence) %>%
round_df(2) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| article | r_recalculated_weighted_mean | mean_r_forest_plot_numeric | difference | congruence |
|---|---|---|---|---|
| Nicholson, Dempsey et al. (2013) | 0.46 | 0.48 | -0.02 | Incongruent |
| Vahey et al. (2009) | 0.58 | 0.53 | 0.05 | Incongruent |
The weighted effect sizes reported in Vahey et al.’s forest plot could not be computationally reproduced from the individual effect sizes and degrees of freedom reported in their supplementary materials.
Many values were congruent, but a minority differed (k = 2 [13%]). Table includes those that differed.
I then attempted to reextract the individual effect sizes listed in the supplementary materials and to convert them to Pearson’s r.
Note that only a subset could be reextracted/converted, for the following reasons:
data_reextracted_and_annotated <- read.csv("../data/data_reextracted_and_annotated.csv")
# get effect sizes
data_individual_effect_sizes <- data_reextracted_and_annotated %>%
# consider only those effect sizes employed in vahey et al.'s meta-analysis
filter(es_included_in_vahey_meta == TRUE) %>%
# create an simplified effect size identifier
group_by(article) %>%
mutate(es_number = row_number()) %>%
ungroup() %>%
select(article, es_number, r_vahey, r_from_paper, n_from_paper)
# table
data_individual_effect_sizes %>%
summarize(n_reextracted = sum(!is.na(r_from_paper)),
percent_reextracted = round_half_up(sum(!is.na(r_from_paper))/n(), 3)*100) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| n_reextracted | percent_reextracted |
|---|---|
| 29 | 51.8 |
# assess congruence
data_individual_effect_sizes_congruence <- data_individual_effect_sizes %>%
na.omit() %>%
mutate(congruence = ifelse(round_half_up(r_from_paper, 2) == round_half_up(r_vahey, 2), "Congruent", "Incongruent"),
congruence = fct_relevel(congruence, "Congruent", "Incongruent"),
difference = round_half_up(r_from_paper - r_vahey, 2))
# plot
p_comparison_reextracted <-
ggplot(data_individual_effect_sizes_congruence, aes(r_vahey, r_from_paper)) +
geom_abline(slope = 1, linetype = "dotted") +
geom_point(aes(color = congruence, shape = congruence), size = 2.25) +
theme_classic() +
scale_color_viridis_d(begin = 0.25, end = 0.75) +
theme(legend.position = "top", # c(0.25, 0.85),
legend.title = element_blank()) +
#legend.box.background = element_rect(colour = "black")) +
xlim(0, 0.7) +
ylim(0, 0.7) +
xlab("Reported in Vahey et al.'s (2015)\nsupplementary materials") +
ylab("Reextracted from original articles\n") +
annotate("text", x = .54, y = .175, size = 3, label = "Overestimates validity") +
annotate("text", x = .175, y = .54, size = 3, label = "Underestimates validity")
p_comparison_reextracted

ggplot2::ggsave(filename = "plots/scatter plot comparing effect sizes in supplementary materials to reextracted effect sizes.pdf",
plot = p_comparison_reextracted,
device = "pdf",
width = 4.5,
height = 4.5,
units = "in")
# table
data_individual_effect_sizes_congruence %>%
summarize(n_incongruent = sum(ifelse(congruence == "Incongruent", TRUE, FALSE)),
percent_incongruent = round_half_up(mean(ifelse(congruence == "Incongruent", TRUE, FALSE)), 2)*100) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| n_incongruent | percent_incongruent |
|---|---|
| 13 | 45 |
data_individual_effect_sizes_congruence %>%
filter(difference != 0) %>%
select(article, r_vahey, r_from_paper, difference) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| article | r_vahey | r_from_paper | difference |
|---|---|---|---|
| Carpenter et al. (2012) | 0.47 | 0.03 | -0.44 |
| Carpenter et al. (2012) | 0.23 | 0.19 | -0.04 |
| Carpenter et al. (2012) | 0.40 | 0.18 | -0.22 |
| Nicholson, Dempsey et al. (2014) | 0.22 | 0.23 | 0.01 |
| Nicholson, McCourt et al. (2013) | 0.50 | 0.34 | -0.16 |
| Parling et al. (2012) | 0.25 | 0.24 | -0.01 |
| Remue, et al. (2013) | 0.50 | 0.37 | -0.13 |
| Remue, et al. (2013) | 0.37 | 0.50 | 0.13 |
| Remue, et al. (2013) | 0.58 | 0.39 | -0.19 |
| Remue, et al. (2013) | 0.38 | 0.23 | -0.15 |
| Timko et al. (2010; Study 1) | 0.45 | 0.42 | -0.03 |
| Vahey et al. (2009) | 0.44 | 0.46 | 0.02 |
| Vahey et al. (2010) | 0.25 | 0.26 | 0.01 |
Table includes those that differed.
Vahey et al (2015) stated: “To be included within the current meta-analysis a given statistical effect must have described the co-variation of an IRAP effect with a corresponding clinically-focused criterion variable. To qualify as clinically-focused, the IRAP and criterion variables must have been deemed to target some aspect of a condition included in a major psychiatric diagnostic scheme such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-5, 2013).”
Vahey et al.’s own inclusion criteria state that they would include, as evidence of criterion validity, effects that describe co-variation between IRAP scores and an external variable. However, many of their included effect sizes involve no external variable, only the presence of a non-zero IRAP effect at the group level. IRAP effects without reference to an external variable cannot provide evidence of criterion validity, just as a group’s mean score on a self-report measure cannot, in isolation, provide evidence for that measure’s criterion validity.
Number and proportion of Vahey et al.’s 56 effect sizes from their supplementary materials that cannot provide evidence of criterion validity:
data_reextracted_and_annotated %>%
filter(es_included_in_vahey_meta == TRUE) %>%
mutate(non_criterion_validity_effect = specific_criterion_variable_type == "IRAP effect significance from zero test") %>%
summarize(non_criterion_validity_effect_count = sum(non_criterion_validity_effect),
non_criterion_validity_effect_proportion = mean(non_criterion_validity_effect)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| non_criterion_validity_effect_count | non_criterion_validity_effect_proportion |
|---|---|
| 21 | 0.375 |
Evidence from Vahey et al.’s choices of what to include versus exclude is at times hard to defend (e.g., Vahey et al., 2009; comparisons between the Nicholson et al. studies).
Many effects would seem to meet the criterion of being clinically relevant but were smaller in magnitude.
TODO: expand with examples.
Vahey et al. reported extracting 56 effect sizes prior to exclusions: “Collectively, these 15 articles yielded 56 statistical effects between various clinically focused IRAP effects and their respective criterion variables.” (p.60)
data_reextracted <- read.csv("../data/data_reextracted_and_annotated.csv") %>%
# mark each effect as clinically relevant only if both raters scored it as such
rowwise() %>%
mutate(clinically_relevant = as.logical(max(c(clinically_relevant_criterion_rater_1,
clinically_relevant_criterion_rater_2)))) %>%
ungroup()
total_n <- data_reextracted %>%
distinct(article, n_from_paper) %>%
summarize(total_n = sum(n_from_paper)) %>%
pull(total_n)

I reextracted effect sizes from these same 15 articles and found 322 effect sizes.
Vahey et al. reported 100% inter-rater reliability for the scoring of the suitability of effect sizes for inclusion: “From the outset, there was no disagreement between the authors about what statistical effects should be excluded from the meta-analysis; and of the 56 statistical effects only 8 (i.e. 14%) were not initially cited by both authors for inclusion.” (p.60)
They do not provide information about the excluded 8 effect sizes.
Vahey et al. stated that “The authors decided whether the responses measured by a given IRAP trial-type should co-vary with a specific criterion variable by consulting the relevant empirical literature. In the absence of such evidence, the authors strictly excluded even substantial statistical effects between IRAP effects and accompanying variables from the current meta-analysis.” Note, however, that Vahey et al. did not provide citations of the relevant literature supporting the inclusion of any given effect. As such, this high rate of agreement between Vahey and Nicholson does not necessarily imply that the inclusion and exclusion criteria would achieve the same level of agreement among other raters.
The comprehensively extracted effect sizes were rated by two independent reviewers.
data_raters <- data_reextracted %>%
select(clinically_relevant_criterion_rater_1, clinically_relevant_criterion_rater_2)
data_raters %>%
mutate(agreement = ifelse(clinically_relevant_criterion_rater_1 == clinically_relevant_criterion_rater_2, TRUE, FALSE)) %>%
summarize(interrater_percent_agreement = round_half_up(mean(agreement, na.rm = TRUE), 1)) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

| interrater_percent_agreement |
|---|
| 0.9 |
kappa2(data_raters, "unweighted")

## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 281
## Raters = 2
## Kappa = 0.879
##
## z = 14.8
## p-value = 0
After exclusions, 144 effect sizes remained for inclusion in the new meta-analysis, compared to Vahey et al.’s 56.
This excludes effects that are not clinically relevant, as well as effects that do not involve a relationship with an external variable, which cannot provide evidence of criterion validity.
Details of the meta-analytic strategy, which used contemporary methods:
# exclusions and transformations
data_for_meta_new <- data_reextracted %>%
filter(problematic_analysis == FALSE & clinically_relevant == TRUE) %>%
mutate(variables = paste(irap_variable, criterion_variable, sep = "-")) %>%
dplyr::select(article, variables, r_from_paper, n_from_paper, authors_include_bh) %>%
na.omit() %>%
escalc(measure = "ZCOR",
ri = r_from_paper,
ni = n_from_paper,
data = .,
slab = paste(article, variables),
digits = 8) %>%
rename(ni = n_from_paper) %>%
dplyr::select(article, variables, yi, vi, ni, authors_include_bh) %>%
na.omit() %>%
group_by(article) %>%
mutate(es_number = row_number(),
article_abbreviated = paste(article, es_number)) %>%
ungroup() %>%
mutate(lower = yi - sqrt(vi)*1.96,
upper = yi + sqrt(vi)*1.96)
# fit
fit_new <-
rma.mv(yi = yi,
V = vi,
random = ~ 1 | article,
method = "REML",
data = data_for_meta_new,
slab = article_abbreviated)
# save to disk for pdf plots
#write_rds(fit_new, "models/fit_new.rds")
predictions_new <-
predict(fit_new) %>%
as.data.frame() %>%
select(-se)
predictions_new_backtransformed <- predictions_new %>%
mutate_all(transf.ztor) %>%
round_df(2)
# plot
data_forest_new <- data_for_meta_new %>%
drop_na() %>%
select(article_abbreviated,
n = ni,
r = yi,
lower,
upper) %>%
# bind_rows(tibble(article_abbreviated = "Meta",
# n = 35, # arbitrary number to make the diamond an appropriate size
# r = predictions_new$pred,
# lower = predictions_new$ci.lb,
# upper = predictions_new$ci.ub)) %>%
bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
n = 35, # arbitrary number to make the diamond an appropriate size
r = c(predictions_new$pred, predictions_new$pred),
lower = c(predictions_new$ci.lb, predictions_new$pi.lb),
upper = c(predictions_new$ci.ub, predictions_new$pi.ub))) %>%
mutate(r = transf.ztor(r),
lower = transf.ztor(lower),
upper = transf.ztor(upper)) %>%
mutate(size = n/sum(n),
n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
article_abbreviated = fct_relevel(article_abbreviated,
"Carpenter et al. (2012) 1", "Carpenter et al. (2012) 10",
"Carpenter et al. (2012) 11", "Carpenter et al. (2012) 12", "Carpenter et al. (2012) 13",
"Carpenter et al. (2012) 14", "Carpenter et al. (2012) 15", "Carpenter et al. (2012) 16",
"Carpenter et al. (2012) 17", "Carpenter et al. (2012) 18", "Carpenter et al. (2012) 19",
"Carpenter et al. (2012) 2", "Carpenter et al. (2012) 20", "Carpenter et al. (2012) 3",
"Carpenter et al. (2012) 4", "Carpenter et al. (2012) 5", "Carpenter et al. (2012) 6",
"Carpenter et al. (2012) 7", "Carpenter et al. (2012) 8", "Carpenter et al. (2012) 9",
"Dawson et al. (2009) 1", "Dawson et al. (2009) 2", "Dawson et al. (2009) 3",
"Dawson et al. (2009) 4", "Dawson et al. (2009) 5", "Dawson et al. (2009) 6",
"Hussey & Barnes-Holmes (2012) 1", "Hussey & Barnes-Holmes (2012) 10",
"Hussey & Barnes-Holmes (2012) 11", "Hussey & Barnes-Holmes (2012) 12",
"Hussey & Barnes-Holmes (2012) 13", "Hussey & Barnes-Holmes (2012) 14",
"Hussey & Barnes-Holmes (2012) 15", "Hussey & Barnes-Holmes (2012) 16",
"Hussey & Barnes-Holmes (2012) 17", "Hussey & Barnes-Holmes (2012) 18",
"Hussey & Barnes-Holmes (2012) 19", "Hussey & Barnes-Holmes (2012) 2",
"Hussey & Barnes-Holmes (2012) 20", "Hussey & Barnes-Holmes (2012) 21",
"Hussey & Barnes-Holmes (2012) 22", "Hussey & Barnes-Holmes (2012) 23",
"Hussey & Barnes-Holmes (2012) 24", "Hussey & Barnes-Holmes (2012) 25",
"Hussey & Barnes-Holmes (2012) 26", "Hussey & Barnes-Holmes (2012) 27",
"Hussey & Barnes-Holmes (2012) 28", "Hussey & Barnes-Holmes (2012) 29",
"Hussey & Barnes-Holmes (2012) 3", "Hussey & Barnes-Holmes (2012) 30",
"Hussey & Barnes-Holmes (2012) 4", "Hussey & Barnes-Holmes (2012) 5",
"Hussey & Barnes-Holmes (2012) 6", "Hussey & Barnes-Holmes (2012) 7",
"Hussey & Barnes-Holmes (2012) 8", "Hussey & Barnes-Holmes (2012) 9",
"Nicholson & Barnes-Holmes (2012b) 1", "Nicholson & Barnes-Holmes (2012b) 10",
"Nicholson & Barnes-Holmes (2012b) 2", "Nicholson & Barnes-Holmes (2012b) 3",
"Nicholson & Barnes-Holmes (2012b) 4", "Nicholson & Barnes-Holmes (2012b) 5",
"Nicholson & Barnes-Holmes (2012b) 6", "Nicholson & Barnes-Holmes (2012b) 7",
"Nicholson & Barnes-Holmes (2012b) 8", "Nicholson & Barnes-Holmes (2012b) 9",
"Nicholson, Dempsey et al. (2014) 1", "Nicholson, Dempsey et al. (2014) 10",
"Nicholson, Dempsey et al. (2014) 11", "Nicholson, Dempsey et al. (2014) 12",
"Nicholson, Dempsey et al. (2014) 13", "Nicholson, Dempsey et al. (2014) 14",
"Nicholson, Dempsey et al. (2014) 15", "Nicholson, Dempsey et al. (2014) 16",
"Nicholson, Dempsey et al. (2014) 17", "Nicholson, Dempsey et al. (2014) 18",
"Nicholson, Dempsey et al. (2014) 19", "Nicholson, Dempsey et al. (2014) 2",
"Nicholson, Dempsey et al. (2014) 20", "Nicholson, Dempsey et al. (2014) 21",
"Nicholson, Dempsey et al. (2014) 22", "Nicholson, Dempsey et al. (2014) 3",
"Nicholson, Dempsey et al. (2014) 4", "Nicholson, Dempsey et al. (2014) 5",
"Nicholson, Dempsey et al. (2014) 6", "Nicholson, Dempsey et al. (2014) 7",
"Nicholson, Dempsey et al. (2014) 8", "Nicholson, Dempsey et al. (2014) 9",
"Nicholson, McCourt et al. (2013) 1", "Nicholson, McCourt et al. (2013) 10",
"Nicholson, McCourt et al. (2013) 2", "Nicholson, McCourt et al. (2013) 3",
"Nicholson, McCourt et al. (2013) 4", "Nicholson, McCourt et al. (2013) 5",
"Nicholson, McCourt et al. (2013) 6", "Nicholson, McCourt et al. (2013) 7",
"Nicholson, McCourt et al. (2013) 8", "Nicholson, McCourt et al. (2013) 9",
"Timko et al. (2010; Study 1) 1", "Timko et al. (2010; Study 1) 10",
"Timko et al. (2010; Study 1) 11", "Timko et al. (2010; Study 1) 12",
"Timko et al. (2010; Study 1) 13", "Timko et al. (2010; Study 1) 14",
"Timko et al. (2010; Study 1) 15", "Timko et al. (2010; Study 1) 16",
"Timko et al. (2010; Study 1) 17", "Timko et al. (2010; Study 1) 18",
"Timko et al. (2010; Study 1) 19", "Timko et al. (2010; Study 1) 2",
"Timko et al. (2010; Study 1) 20", "Timko et al. (2010; Study 1) 21",
"Timko et al. (2010; Study 1) 22", "Timko et al. (2010; Study 1) 23",
"Timko et al. (2010; Study 1) 24", "Timko et al. (2010; Study 1) 25",
"Timko et al. (2010; Study 1) 26", "Timko et al. (2010; Study 1) 27",
"Timko et al. (2010; Study 1) 28", "Timko et al. (2010; Study 1) 29",
"Timko et al. (2010; Study 1) 3", "Timko et al. (2010; Study 1) 30",
"Timko et al. (2010; Study 1) 31", "Timko et al. (2010; Study 1) 32",
"Timko et al. (2010; Study 1) 4", "Timko et al. (2010; Study 1) 5",
"Timko et al. (2010; Study 1) 6", "Timko et al. (2010; Study 1) 7",
"Timko et al. (2010; Study 1) 8", "Timko et al. (2010; Study 1) 9",
"Timko et al. (2010; Study 2) 1", "Timko et al. (2010; Study 2) 10",
"Timko et al. (2010; Study 2) 11", "Timko et al. (2010; Study 2) 12",
"Timko et al. (2010; Study 2) 2", "Timko et al. (2010; Study 2) 3",
"Timko et al. (2010; Study 2) 4", "Timko et al. (2010; Study 2) 5",
"Timko et al. (2010; Study 2) 6", "Timko et al. (2010; Study 2) 7",
"Timko et al. (2010; Study 2) 8", "Timko et al. (2010; Study 2) 9",
#"Meta"
"Meta (confidence interval)",
"Meta (credibility interval)")) %>%
mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
mutate(r = round_half_up(r, 2),
lower = round_half_up(lower, 2),
upper = round_half_up(upper, 2))
p_new <-
forestploter::forest(data = select(data_forest_new, -size),
est = data_forest_new$r,
lower = data_forest_new$lower,
upper = data_forest_new$upper,
sizes = 1/data_forest_new$size,
# is_summary = c(rep(FALSE, nrow(data_forest_new)-1), TRUE),
is_summary = c(rep(FALSE, nrow(data_forest_new)-2), TRUE, TRUE),
ci_column = 2,
ref_line = 0,
theme = custom_theme,
xlim = c(-0.7, 0.9),
ticks_at = c(-.6, -.4, -.2, 0, .2, .4, .6, .8))
p_new
ggplot2::ggsave(filename = "plots/forest plot new multilevel model.pdf",
plot = p_new,
device = "pdf",
width = 7.5,
height = 32,
units = "in")
Meta effect: k = 142, r = 0.18, 95% CI [0.1, 0.26], 95% CR [-0.04, 0.39], p = 0.00001.
Less appropriate one-tailed:
More appropriate two-tailed:
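The one- vs two-tailed distinction above can be made concrete with base R. This is an illustrative sketch only: the z value here is a placeholder, not the z statistic from the fitted model, which `meta_effect_string()` converts via `2*pnorm(-abs(fit$zval))`.

```r
# illustrative z statistic (placeholder value, not taken from the fitted model)
z <- 4.42

# one-tailed (directional) p value: probability mass in one tail only
p_one_tailed <- pnorm(-abs(z))

# two-tailed (non-directional) p value, as used in meta_effect_string() above
p_two_tailed <- 2 * pnorm(-abs(z))

# the two-tailed p is exactly double the one-tailed p,
# so a one-tailed test halves the reported p value
```

The two-tailed version is the more appropriate default here because no direction was pre-registered for the meta-analytic effect.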
This is incompatible with the idea that the IRAP can be used for clinical assessment, which would require that the IRAP tells us something about individuals’ clinical status/group/location on a continuum, etc. (i.e., IRAP effects -> individuals). Most of the effect sizes come from analyses that assume the opposite: that individuals’ clinical status/group/location on a continuum, etc., can tell us about their IRAP effects (i.e., individuals -> IRAP effects).
Even this new meta-analysis is mostly informative only for researchers wishing to study IRAP effects, rather than study individuals using the IRAP.
Similar conceptual criticisms have been made elsewhere. Schmaal et al. (2016) reported results from a very large study which concluded that hippocampal volume ‘robustly discriminate[d] MDD patients from healthy controls’ (p. 1). However, Fried & Kievit (2016) pointed out that this was not the case: depression status predicted a detectable difference in mean hippocampal volume, but hippocampal volume was very weakly associated with individuals’ depression status. That is, Schmaal et al. had reversed their analysis’s DV and IV when drawing conclusions. The same error applies to Vahey et al.: even if their meta-analysis were computationally replicable, its conclusions would still not support their claim that the meta-analysis “demonstrates the potential of the IRAP as a tool for clinical assessment” (p. 64). Rather, it would demonstrate that known clinical status (etc.) has potential as a tool for the assessment of IRAP scores.
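The group-to-individual reversal can be demonstrated with a small hypothetical simulation (invented data, not Vahey et al.’s): with a large enough sample, a small mean difference between groups yields a tiny p value, yet individuals’ scores remain nearly useless for recovering their group membership.

```r
# hypothetical simulation: group-level significance vs individual-level prediction
set.seed(42)
n <- 1000
group <- rep(c(0, 1), each = n)            # e.g., clinical status (IV in the original analyses)
score <- rnorm(2 * n, mean = 0.2 * group)  # outcome with a small group difference (d = 0.2)

# group-level: mean difference is clearly detectable at this sample size
t_test <- t.test(score ~ group)

# individual-level: point-biserial correlation between score and status
# (for equal groups, r = d / sqrt(d^2 + 4), so d = 0.2 implies r of roughly .10)
r_pb <- cor(score, group)
```

Here `t_test$p.value` is far below .05 while `r_pb` hovers around .10: knowing an individual’s score tells us almost nothing about their group, despite the impressive group-level test. This is the same asymmetry at issue in the Schmaal/Fried & Kievit exchange.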